-
Notifications
You must be signed in to change notification settings - Fork 154
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add blob.open() for file-like I/O #385
Conversation
Docstrings are WIP and system tests should be expanded before merging. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some feedback that I want to talk through in implementation. Otherwise this is looking good. I'll review the design document as well.
|
||
# Upload chunks. The SlidingBuffer class will manage seek position. | ||
for _ in range(num_chunks): | ||
upload.transmit_next_chunk(transport) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know if partial success occurs does transmit_next_chunk handle the retry from the remote offset?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Define "partial success"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GCS may receive a subset of bytes from client upload, leaving the client and server next byte offset unaligned. A request to get current GCS offset can be made per Step Checking the status of a resumable upload.
If the client doesn't attempt to recover from this state, the inconsistent byte alignment will cause a 400 error which is not recoverable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. With the changes in this recent update, sliding buffer implements seek() so this feature is as supported as it's going to be in python-storage. If we want further support we need to flesh it out in resumable media.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do that for recovering in a mismatch (address it in resumable media repo). Could you file a tracking issue in that repo?
Added docstrings, seek functionality in the sliding buffer, and other features. PTAL. Remaining questions: can we set up a system test to test end-to-end the error-on-chunk-upload-and-retry behavior? And, even though blob.download methods do not populate the blob generation, can we fetch that generation information from download metadata somehow in order to implement generation lock? (alternative: force blob.reload() before d/l if generation is not populated - race condition possible). |
Remaining questions:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Over LGTM, I have a few nits.
Also retrying on mismatched offset may be better handled / addressed in resumable media than in this wrapper so we could table it for now and open tracking issues in resumable media repo. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! 🚢
Fixes #29